Robust Speaker Recognition in Unknown Noisy Conditions
Authors
J. Ming, T. J. Hazen, J. R. Glass, D. A. Reynolds
Abstract
This paper investigates speaker identification and verification in noisy conditions, assuming that the speech signals are corrupted by environmental noise whose characteristics are unknown. The research is motivated in part by the potential application of speaker recognition technology on handheld devices or over the Internet. While such technology promises an additional biometric layer of security to protect the user, its practical implementation faces many challenges, one of which is environmental noise. Because these systems are mobile, the noise sources can be highly time-varying and potentially unknown, so the recognizer must be robust to noise without any information about it. This paper describes a method, named universal compensation (UC), that combines multi-condition training and the missing-feature method to model noises with unknown temporal-spectral characteristics. Multi-condition training, conducted on simulated noisy data with a limited variety of noises, provides a “coarse” compensation for the noise; the missing-feature method then refines this compensation by ignoring noise variations outside the given training conditions, thereby reducing the mismatch between training and testing. The paper focuses on several issues in implementing the UC model for real-world applications: generating multi-condition training data that model real-world noisy speech, combining different training data to optimize recognition performance, and reducing the model’s complexity. Two databases were used to test the UC algorithm. The first is a redevelopment of the TIMIT database, created by re-recording the data in the presence of various noises; it was used to test the model for speaker identification, with a focus on noise variety.
The second is a handheld-device database collected in realistic noisy conditions; it was used to further validate the model on real-world data for speaker verification. Compared with baseline systems, the new model showed improved identification and verification performance.
J. Ming is with the School of Electrical Engineering and Computer Science, Queen’s University Belfast, Belfast BT7 1NN, U.K. (phone: 44-28-90974723; fax: 44-28-90975666). T. J. Hazen and J. R. Glass are with the MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA 02139, U.S.A. (phone: 1-617-253-4672/1640; fax: 1-617-258-8642). D. A. Reynolds is with the MIT Lincoln Laboratory, Lexington, MA 02420, U.S.A. (phone: 1-781-981-4494; fax: 1-781-981-0186). The work was sponsored in part by Intel Corporation, and in part by the Department of Defense under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.
November 10, 2005 DRAFT
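The two ingredients of UC named in the abstract, multi-condition training on simulated noisy data and a missing-feature refinement, can be illustrated with a small toy sketch. The abstract does not say which noises, SNRs, features, or missing-feature rule the authors use, so everything below is an assumption for illustration only: `add_noise_at_snr` and `multicondition_copies` are hypothetical helpers that build simulated noisy training copies at a hypothetical SNR grid, and `union_score` is a hypothetical union-style missing-feature score that, for each frame, combines subband log-likelihoods over every subset that leaves out `n_ignored` subbands, so that a subband corrupted by noise outside the training conditions can drop out of the score. This is not the authors' implementation.

```python
import itertools

import numpy as np


def add_noise_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Return clean + g * noise, with g chosen so the mixture has SNR snr_db (in dB)."""
    noise = np.resize(noise, clean.shape)  # repeat or trim the noise to the signal length
    p_signal = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve 10 * log10(p_signal / (g**2 * p_noise)) == snr_db for the gain g.
    gain = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise


def multicondition_copies(clean, noise, snrs_db=(20.0, 15.0, 10.0, 5.0, 0.0)):
    """Clean signal plus one simulated noisy copy per SNR.

    The SNR grid is a hypothetical choice; the abstract does not give one.
    """
    return [clean] + [add_noise_at_snr(clean, noise, snr) for snr in snrs_db]


def union_score(subband_loglik: np.ndarray, n_ignored: int) -> float:
    """Union-style missing-feature score (an assumed rule, not taken from the paper).

    subband_loglik[t, k] is log p(x_{t,k} | speaker model) for frame t, subband k.
    For each frame, log-likelihoods are summed over every subset of
    K - n_ignored subbands and the subsets are combined with log-sum-exp,
    so up to n_ignored badly mismatched subbands can drop out of the score.
    """
    _, n_bands = subband_loglik.shape
    if not 0 <= n_ignored < n_bands:
        raise ValueError("n_ignored must satisfy 0 <= n_ignored < number of subbands")
    subsets = itertools.combinations(range(n_bands), n_bands - n_ignored)
    # Shape (T, number_of_subsets): per-frame log-likelihood of each kept subset.
    per_subset = np.stack(
        [subband_loglik[:, list(s)].sum(axis=1) for s in subsets], axis=1
    )
    # Numerically stable log-sum-exp over subsets, one value per frame.
    peak = per_subset.max(axis=1)
    frame_scores = peak + np.log(np.exp(per_subset - peak[:, None]).sum(axis=1))
    return float(frame_scores.sum())


rng = np.random.default_rng(0)
copies = multicondition_copies(rng.standard_normal(16000), rng.standard_normal(4000))
frame = np.array([[-100.0, -1.0, -1.0]])  # subband 0 looks badly corrupted
print(len(copies), union_score(frame, 0), round(union_score(frame, 1), 3))
# prints: 6 -102.0 -2.0
```

In this toy setting the noisy copies would be pooled to train each speaker model (the "coarse" multi-condition step), and `union_score` would be computed from that model's per-subband log-likelihoods for each test utterance. With `n_ignored=1` the corrupted subband no longer dominates the frame score. Enumerating subsets costs C(K, n_ignored) terms per frame, which stays small for a handful of subbands.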
Similar resources
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional neural networks (CNNs) have demonstrated their effectiveness in speech recognition systems, both for feature extraction and for acoustic modeling. CNNs have also been used for robust speech recognition, with competitive results reported. A convolutive bottleneck network (CBN) is a type of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
Pitch maxima for robust speaker recognition
This paper presents a novel approach to the design of a robust speaker recognition system. A noise-free synthesised spectrum is produced from a noisy spectrum, and this synthesised spectrum is used for feature extraction. The pitch is extracted from the noisy speech using a robust pitch estimation algorithm. This also helps in identifying the voiced segments of speech, which are the only ones considered...
Noise-robust multi-stream fusion for text-independent speaker authentication
Multi-stream approaches have proven very successful in speech recognition tasks and, to a certain extent, in speaker authentication tasks. In this study we propose a noise-robust multi-stream text-independent speaker authentication system. This system has two steps: first, train the stream experts under clean conditions, and then train the combination mechanism to merge the scores of the strea...
Acoustic factor analysis based universal background model for robust speaker verification in noise
The Universal Background Model (UBM) is a speaker-independent Gaussian Mixture Model (GMM) trained on a large speech corpus containing many speakers’ recordings in various conditions. When noisy test data are involved, a UBM trained on clean data is generally not optimal. Using noisy data for UBM training, however, creates a bias towards the specific development noise samples resulting in...
Robust speaker recognition using microphone arrays
This paper investigates the use of microphone arrays in hands-free speaker recognition systems. Hands-free operation is preferable in many potential speaker recognition applications; however, obtaining acceptable performance with a single distant microphone is problematic in real noise conditions. A possible solution to this problem is the use of microphone arrays, which have the capacity to enha...
Noise robust speaker verification with delta cepstrum normalization
This paper introduces a delta cepstrum normalization (DCN) technique for speaker verification under noisy conditions. Cepstral feature normalization techniques are widely used to mitigate spectral variations caused by various types of noise; however, little attention has been paid to normalizing delta features. A DCN technique that normalizes not only base features but also delta-features was r...